Locally Determining the Number of Neighbors in the k-Nearest Neighbor Rule Based on Statistical Confidence
Authors
Abstract
The k-nearest neighbor rule is one of the most attractive pattern classification algorithms. In practice, the value of k is usually determined by cross-validation. In this work, we propose a new method that locally determines the number of nearest neighbors based on the concept of statistical confidence. We define the confidence associated with decisions that are made by the majority rule from a finite number of observations and use it as a criterion to determine the number of nearest neighbors needed. The new algorithm is tested on several real-world datasets and yields results comparable to those obtained by the k-nearest neighbor rule. In contrast to the k-nearest neighbor rule, which uses a fixed number of nearest neighbors throughout the feature space, our method locally adjusts the number of neighbors until a satisfactory level of confidence is reached. In addition, the statistical confidence provides a natural way to balance the trade-off between the reject rate and the error rate by excluding patterns that have low confidence levels.
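The Python sketch below illustrates one plausible reading of the locally adaptive rule described in the abstract: the neighborhood around a query point is grown one neighbor at a time, a sign-test-style confidence in the majority vote is computed at each size, and the decision is accepted once the confidence exceeds a threshold; otherwise the pattern can be rejected. The function names, the specific binomial confidence measure, and the parameters conf_threshold and k_max are illustrative assumptions, not the authors' exact formulation.

import numpy as np
from math import comb
from collections import Counter

def majority_confidence(n_major, n_runner_up):
    # Sign-test-style confidence (an assumed stand-in for the paper's measure):
    # 1 - P(X >= n_major), where X ~ Binomial(n_major + n_runner_up, 0.5).
    n = n_major + n_runner_up
    tail = sum(comb(n, i) for i in range(n_major, n + 1)) / 2.0 ** n
    return 1.0 - tail

def adaptive_knn_predict(X_train, y_train, x, conf_threshold=0.9, k_max=50):
    """Grow the neighborhood of x until the majority decision reaches the
    desired confidence; return (label, k, confidence), with label None if the
    pattern is rejected because the threshold is never reached.
    X_train and y_train are NumPy arrays."""
    dists = np.linalg.norm(X_train - x, axis=1)    # Euclidean distances to all training points
    order = np.argsort(dists)                      # neighbor indices sorted by distance
    confidence, k = 0.0, 0
    for k in range(1, min(k_max, len(order)) + 1):
        counts = Counter(y_train[order[:k]]).most_common()
        majority_label, n_major = counts[0]
        n_runner_up = counts[1][1] if len(counts) > 1 else 0
        confidence = majority_confidence(n_major, n_runner_up)
        if confidence >= conf_threshold:
            return majority_label, k, confidence   # confident decision with a locally chosen k
    return None, k, confidence                     # reject: trade error rate for reject rate

Raising conf_threshold makes the rule reject more low-confidence patterns, which is how, under this interpretation, the confidence level mediates the reject-rate versus error-rate trade-off mentioned in the abstract.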
Similar articles
Neighborhood size selection in the k-nearest-neighbor rule using statistical confidence
The k-nearest-neighbor rule is one of the most attractive pattern classification algorithms. In practice, the choice of k is determined by the cross-validation method. In this work, we propose a new method for neighborhood size selection that is based on the concept of statistical confidence. We define the confidence associated with a decision that is made by the majority rule from a finite num...
A Statistical Confidence-Based Adaptive Nearest Neighbor Algorithm for Pattern Classification
The k-nearest neighbor rule is one of the simplest and most attractive pattern classification algorithms. It can be interpreted as an empirical Bayes classifier based on the estimated a posteriori probabilities from the k nearest neighbors. The performance of the k-nearest neighbor rule relies on the locally constant a posteriori probability assumption. This assumption, however, becomes problem...
A New Hybrid Approach of K-Nearest Neighbors Algorithm with Particle Swarm Optimization for E-Mail Spam Detection
Email is one of the fastest and most economical forms of communication. The growth in the number of email users has caused an increase in spam in recent years. Spam not only wastes users' time, money, and bandwidth, but has also become a risk to the efficiency, reliability, and security of a network. Spam developers are always trying to find ways to evade existing filters, so new filters to de...
Optimized Nearest Neighbor Methods: Cam Weighted Distance & Statistical Confidence
Nearest neighbor classification methods are a useful and relatively straightforward classification technique to implement. However, despite such appeal, they still suffer from the curse of dimensionality. Additionally, the nature of the data sets may not be wholly applicable to the model assumed in the nearest neighbor methods. As such, there have been many proposed optimizations. Two such opt...
Estimation of Density using Plotless Density Estimator Criteria in Arasbaran Forest
Sampling methods have a theoretical basis and should be operational in different forests; therefore selecting an appropriate sampling method is effective for accurate estimation of forest characteristics. The purpose of this study was to estimate the stand density (number per hectare) in Arasbaran forest using a variety of the plotless density estimators of the nearest neighbors sampling me...
Journal:
Volume, Issue:
Pages: -
Publication date: 2005